Learning IE patterns: a terminology extraction perspective
Abstract
The large-scale applicability of knowledge-based information access systems, such as those based on Information Extraction (IE) techniques, depends strongly on the possibility of automatically acquiring the large amount of knowledge they require. However, the basic assumption of the IE paradigm, i.e. that the information need is known in advance, inherently limits its applicability: the resulting IE pattern learning algorithms are generally not conceived for analysing large corpora unless driven by a specific information need. In terminological studies, by contrast, the corpora rather than the information needs drive the extraction of knowledge, so these studies offer many insights and mechanisms for automatically modelling the knowledge content of a coherent text collection. In this paper, we present a terminological perspective on the acquisition of IE patterns, based on a novel algorithm for estimating the domain relevance of the relations among domain concepts. Both the algorithm and its representation space are presented. Before that discussion, however, we describe the overall process of building a domain ontology from an extensional domain model (i.e. the collected domain corpus). Finally, we present the results of applying the algorithm to a large domain corpus and discuss the resulting ontology.
Similar resources
On the Expressiveness of Information Extraction Patterns
Many recently reported machine learning approaches to the acquisition of information extraction (IE) patterns have used dependency trees as the basis for their pattern representations (Yangarber et al., 2000a; Yangarber, 2003; Sudo et al., 2003; Stevenson and Greenwood, 2005). While varying results have been reported for the resulting IE systems little has been reported about the ability of dep...
Estimating Relevance and Semantic Compatibility for IE Pattern Discovery in Large Text Corpora
Pattern-based approaches for Information Extraction (IE) typically apply a pattern learner to a set of domain-specific training documents to generate extraction patterns for the IE system. This restricts the coverage of the system primarily to the expressions and language constructs that appear within the limited training data. Our research looks to the vast quantities of readily available text...
Learning Information Extraction Patterns Using WordNet
Information Extraction (IE) systems often use patterns to identify relevant information in text but these are difficult and time-consuming to generate manually. This paper presents a new approach to the automatic learning of IE patterns which uses WordNet to judge the similarity between patterns. The algorithm starts with a small set of sample extraction patterns and uses a similarity metric, b...
A Task-based Comparison of Information Extraction Pattern Models
Several recent approaches to Information Extraction (IE) have used dependency trees as the basis for an extraction pattern representation. These approaches have used a variety of pattern models (schemes which define the parts of the dependency tree which can be used to form extraction patterns). Previous comparisons of these pattern models are limited by the fact that they have used indirect ta...
Transformation-Based Information Extraction Using Learned Meta-rules
Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents. Although automated IE systems began to be developed using machine learning techniques recently, the performances of those IE systems still need to be improved. This paper describes an information extraction system based on transformation-based learning, which u...